
22 research outputs found

Thai Automatic Speech Recognition

Author: Black Alan W.
Charoenpornsawat Paisarn
Schultz Tanja
Suebvisai Sinaporn
Woszczyna Monika
Publication date: 16/06/2008

KITopen

The CMU TransTac 2007 Eyes-free and Hands-free Two-way Speech-to-Speech Translation System

Author: Waibel Alex
Bach Nguyen
Black Alan W.
Charoenpornsawat Paisarn
Eck Matthias
Hsiao Roger
Köhler Thilo
Nguyen ThuyLinh
Schultz Tanja
Stüker Sebastian
Vogel Stephan
Publication venue: Trento
Publication date: 01/01/2007

KITopen

Improving Word Segmentation for Thai Speech Translation

Author: Paisarn Charoenpornsawat
Tanja Schultz
Publication date: 01/01/2008

A vocabulary list and a language model are primary components of a speech translation system. Generating both from plain text is straightforward for English. However, it is quite challenging for languages such as Chinese, Japanese, or Thai, whose written text has no word boundary delimiter. For Thai word segmentation, Maximal Matching, a lexicon-based approach, is a popular method. Nevertheless, this method relies heavily on the coverage of the lexicon: when the text contains an unknown word, it usually produces an incorrect boundary, and some words are then not retrieved when words are extracted from the segmented text. In this paper, we propose statistical techniques to tackle this problem. Based on different word segmentation methods, we develop various speech translation systems and show that the proposed method can significantly improve translation accuracy, by about 6.42% BLEU points compared to the baseline system.
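To illustrate the lexicon-based baseline described above, here is a minimal Python sketch of maximal matching written as a dynamic program that picks the segmentation with the fewest words. The function name, the single-character fallback for text no lexicon word covers, and the toy English lexicon standing in for Thai are assumptions of this sketch, not the paper's implementation.

```python
from typing import List, Optional, Set, Tuple


def maximal_matching(text: str, lexicon: Set[str],
                     max_word_len: Optional[int] = None) -> List[str]:
    """Segment `text` into as few lexicon words as possible.

    Characters not covered by any lexicon word become one-character tokens
    (an assumption of this sketch), so a segmentation is always returned.
    """
    if max_word_len is None:
        max_word_len = max([1] + [len(w) for w in lexicon])
    n = len(text)
    inf = float("inf")
    # best[i] = (unknown characters, tokens) for the best split of text[:i];
    # tuples compare lexicographically, so fewer unknown characters wins first.
    best: List[Tuple[float, float]] = [(inf, inf)] * (n + 1)
    best[0] = (0, 0)
    back = [0] * (n + 1)  # back[i] = start index of the last token ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            piece = text[j:i]
            if piece in lexicon:
                cost = (best[j][0], best[j][1] + 1)
            elif i - j == 1:
                cost = (best[j][0] + 1, best[j][1] + 1)  # unknown single char
            else:
                continue
            if cost < best[i]:
                best[i] = cost
                back[i] = j
    # Follow back-pointers from the end of the text to recover the tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


lexicon = {"i", "like", "ice", "cream", "icecream"}
print(maximal_matching("ilikeicecream", lexicon))  # ['i', 'like', 'icecream']
print(maximal_matching("ilikexyz", lexicon))       # ['i', 'like', 'x', 'y', 'z']
```

The second call shows the weakness the abstract points to: the unknown word "xyz" is not in the lexicon, so it falls apart into single characters and would never be extracted as a word.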

CiteSeerX

Crossref

KITopen

Automatic Sentence Break Disambiguation for Thai

Author: Paisarn Charoenpornsawat
Virach Sornlertlamvanich

Unlike English, Thai has no explicit sentence marker. In written Thai, a space is conventionally placed at the end of a sentence, but a space does not always indicate a sentence boundary. In this paper, we propose a feature-based algorithm that extracts sentences from a paragraph by detecting the spaces that break sentences. The algorithm examines the context around each space to determine whether or not it is a sentence-breaking space. The previous method, a probabilistic POS trigram approach, considers only coarse part-of-speech information within a limited context, whereas the feature-based approach considers as many features as possible. A feature can be anything that tests specific information in the context around the target word sequence, such as context words and collocations. To extract such features automatically from a training corpus, we employ the Winnow learning algorithm. The experimental results show that Winnow is effective and superior to the POS trigram approach on this task.
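The Winnow step can be sketched as a mistake-driven classifier over sparse binary features, here in the Winnow2 variant (multiply active weights by alpha after a missed positive, divide by alpha after a false positive). The class name, the threshold and alpha values, and the feature strings (romanized Thai context words such as "khrap" and "phom") are hypothetical and only illustrate the idea; they are not the paper's feature set or settings.

```python
from typing import Dict, Iterable, Set, Tuple


class Winnow:
    """Mistake-driven Winnow2 classifier over sparse binary features."""

    def __init__(self, threshold: float = 2.0, alpha: float = 2.0) -> None:
        self.threshold = threshold
        self.alpha = alpha
        self.weights: Dict[str, float] = {}  # unseen features weigh 1.0

    def score(self, features: Set[str]) -> float:
        return sum(self.weights.get(f, 1.0) for f in features)

    def predict(self, features: Set[str]) -> int:
        """Return 1 for a sentence-breaking space, 0 otherwise."""
        return 1 if self.score(features) >= self.threshold else 0

    def update(self, features: Set[str], label: int) -> None:
        if self.predict(features) == label:
            return  # Winnow changes weights only after a mistake
        # Promote active features on a missed break, demote them on a false alarm.
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor

    def train(self, data: Iterable[Tuple[Set[str], int]], epochs: int = 5) -> None:
        examples = list(data)
        for _ in range(epochs):
            for features, label in examples:
                self.update(features, label)


# Hypothetical contexts around a space: previous word (w-1) and next word (w+1).
data = [
    ({"w-1=khrap", "w+1=phom"}, 1),
    ({"w-1=rot", "w+1=phom"}, 0),
    ({"w-1=khrap", "w+1=rot"}, 1),
]
model = Winnow()
model.train(data)
print(model.predict({"w-1=khrap", "w+1=phom"}))  # 1
print(model.predict({"w-1=rot", "w+1=phom"}))    # 0
```

Because weights are updated multiplicatively and only on mistakes, irrelevant features are driven down quickly, which suits the "as many features as possible" setting described above.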

CiteSeerX

Example-Based Grapheme-to-Phoneme Conversion for Thai

Author: Paisarn Charoenpornsawat
Tanja Schultz

Several characteristics of the Thai writing system make Thai grapheme-to-phoneme (G2P) conversion very challenging. In this paper, we propose an Example-Based Grapheme-to-Phoneme conversion approach. It generates the pronunciation of a word by selecting, modifying, and combining the pronunciations of syllables from a training corpus. The best system achieves 80.99% word accuracy and 94.19% phone accuracy, significantly outperforming previous approaches for Thai.
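A much simplified sketch of the "select and combine" idea: pronunciations of grapheme chunks are collected from training pairs, and a word is converted by greedy longest match, taking the most frequent pronunciation seen for each chunk. The function names, the greedy matching strategy, and the toneless phone strings are assumptions for illustration; the "modifying" step and the paper's actual syllable handling are not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_examples(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each grapheme chunk is pronounced each way in training data."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for graphemes, phones in pairs:
        table[graphemes][phones] += 1
    return dict(table)


def example_based_g2p(word: str, table: Dict[str, Counter]) -> str:
    """Greedy longest-match lookup that selects and concatenates example pronunciations."""
    max_len = max(len(g) for g in table)
    i, output = 0, []
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in table:
                # Select the most frequent pronunciation seen for this chunk.
                output.append(table[chunk].most_common(1)[0][0])
                i += length
                break
        else:
            raise ValueError(f"no training example covers {word[i:]!r}")
    return " ".join(output)


# Toy training pairs; phone strings are simplified and omit tones.
examples = build_examples([
    ("กิน", "k i n"),
    ("ข้าว", "kh aa w"),
    ("มา", "m aa"),
])
print(example_based_g2p("กิน" + "ข้าว", examples))  # k i n kh aa w
```

Raising an error when no example covers the remaining graphemes keeps the sketch honest about coverage; a real system would need a back-off strategy there.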

CiteSeerX

Example-based Grapheme-to-Phoneme Conversion for Thai

Author: Charoenpornsawat Paisarn
Schultz Tanja
Publication date: 17/06/2008

KITopen

Thai Grapheme-Based Speech Recognition

Author: Paisarn Charoenpornsawat
Sanjika Hewavitharana
Tanja Schultz
Publication date: 06/02/2008

In this paper, we present results for building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context-independent system, different numbers of acoustic models, and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of sharing parameters across models. We compare our system with two phoneme-based systems: one that uses a hand-crafted dictionary and another that uses an automatically generated dictionary. Experimental results show that the grapheme-based system with enhanced tree clustering outperforms the phoneme-based system using an automatically generated dictionary and achieves results comparable to the phoneme-based system with the hand-crafted dictionary.
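In a grapheme-based recognizer, a word's written form doubles as its pronunciation, so no hand-crafted dictionary is needed. The sketch below builds such a "pronunciation" and attaches left and right context to each unit, in the style of triphone labels, to show what varying the context of the speech unit means. Treating every Unicode code point as one unit (Thai vowels and tone marks are separate code points) and the "l-c+r" label format are assumptions; the paper's unit inventory and clustering are not shown.

```python
from typing import List


def grapheme_units(word: str) -> List[str]:
    """Use the word's own characters as its 'pronunciation' (one unit per code point)."""
    return list(word)


def context_units(units: List[str], width: int = 1, pad: str = "#") -> List[str]:
    """Attach `width` units of left and right context to each unit, e.g. 'a-b+c'."""
    if width == 0:
        return list(units)  # context-independent units
    padded = [pad] * width + units + [pad] * width
    labels = []
    for i in range(width, width + len(units)):
        left = "-".join(padded[i - width:i])
        right = "+".join(padded[i + 1:i + 1 + width])
        labels.append(f"{left}-{padded[i]}+{right}")
    return labels


print(grapheme_units("มา"))                 # ['ม', 'า']
print(context_units(["a", "b", "c"]))       # ['#-a+b', 'a-b+c', 'b-c+#']
print(context_units(["a", "b", "c"], 0))    # ['a', 'b', 'c']
```

Wider contexts produce many more distinct units than there is training data for, which is why parameter sharing such as the tree clustering mentioned above becomes important.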

CiteSeerX

KITopen

Spontaneous Thai Speech Recognition

Author: Charoenpornsawat Paisarn
Schultz Tanja
Woszczyna Monika
Publication date: 17/06/2008

KITopen

A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis

Author: Paisarn Charoenpornsawat
Virach Sornlertlamvanich
Virongrong Tesprasit
Publication date: 01/01/2003

Homograph ambiguity is an inherent issue in text-to-speech (TTS) synthesis. Several efficient approaches have been proposed to disambiguate homographs, such as part-of-speech (POS) n-grams, Bayesian classifiers, decision trees, and Bayesian-hybrid approaches. These methods need the words and/or POS tags surrounding the homographs in question. Some languages, such as Thai, Chinese, and Japanese, have no word-boundary delimiter, so word boundaries must be identified before homograph ambiguity can be resolved. In this paper, we propose a single framework that solves the word segmentation and homograph ambiguity problems together. Our model employs both local and long-distance contexts, which are automatically extracted by a machine learning technique called Winnow.
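The distinction between local and long-distance context can be sketched as feature extraction around a target token: immediate neighbours and their collocation are local features, while any word within a wider window is a long-distance feature. This sketch assumes the text is already segmented into tokens, which the paper's joint framework does not require, and the feature templates and `window` parameter are hypothetical. The resulting feature sets could feed a Winnow-style classifier.

```python
from typing import List, Set


def context_features(tokens: List[str], index: int, window: int = 5) -> Set[str]:
    """Extract local and long-distance context features for tokens[index]."""
    feats: Set[str] = set()
    # Local context: the immediate neighbours and their collocation.
    left = tokens[index - 1] if index > 0 else "<s>"
    right = tokens[index + 1] if index + 1 < len(tokens) else "</s>"
    feats.add(f"w-1={left}")
    feats.add(f"w+1={right}")
    feats.add(f"w-1,w+1={left}_{right}")
    # Long-distance context: any word within `window` tokens on either side.
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    for j in range(lo, hi):
        if j != index:
            feats.add(f"ctx={tokens[j]}")
    return feats


tokens = ["a", "b", "X", "c", "d"]
print(sorted(context_features(tokens, 2, window=1)))
```

With `window=1` the long-distance features only cover the neighbours "b" and "c"; a larger window adds "a" and "d" as well.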

CiteSeerX

Crossref

Feature-based Thai Word Segmentation

Author: Boonserm Kijsirikul
Paisarn Charoenpornsawat
Surapant Meknavin
Publication date: 01/01/1997

Word segmentation is a problem in several Asian languages that have no explicit word boundary delimiter, e.g. Chinese, Japanese, Korean, and Thai. We propose to use feature-based approaches for Thai word segmentation.

CiteSeerX